Making Test Corpora for Question Answering More Representative
نویسندگان
چکیده
Despite two high profile series of challenges devoted to question answering technologies there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.
منابع مشابه
ارایه یک پیکره پرسش و پاسخ مذهبی در زبان فارسی
Question answering system is a field in natural language processing and information retrieval noticed by researchers in these decades. Due to a growing interest in this field of research, the need to have appropriate data sources is perceived. Most researches about developing question answering corpus area have been done in English so far, but in other languages as Persian, the lack of these co...
متن کاملAnalysis of Wikipedia-based Corpora for Question Answering
This paper gives comprehensive analyses of corpora based on Wikipedia for several tasks in question answering. Four recent corpora are collected, WIKIQA, SELQA, SQUAD, and INFOBOXQA, and first analyzed intrinsically by contextual similarities, question types, and answer categories. These corpora are then analyzed extrinsically by three question answering tasks, answer retrieval, selection, and ...
متن کاملReflections on TREC QA
The TREC (later TAC) Question Answering track reinvigorated the question answering research community, fostering extensive research on different question types and finding answers in different kinds of corpora. Parallel evaluations extended the research further to include a variety of languages and media types. In recent years, the TAC QA track evolved into the Knowledge Base Population (KBP) t...
متن کاملInvestigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of question in a general open domain QA can have embedding relation through their mentions of noun phrase expressions. We present methods f...
متن کاملImproving Question Answering for Reading Comprehension Tests by Combining Multiple Systems
Most work on reading comprehension question answering systems has focused on improving performance by adding complex natural language processing (NLP) components to such systems rather than by combining the output of multiple systems. Our paper empirically evaluates whether combining the outputs of seven such systems submitted as the final projects for a graduate level class can improve over th...
متن کامل